AUC optimization for deep learning-based voice activity detection

Authors

Abstract

Voice activity detection (VAD) based on deep neural networks (DNN) has demonstrated good performance in adverse acoustic environments. Current DNN-based VAD optimizes a surrogate function, e.g., minimum cross-entropy or minimum squared error, at a given decision threshold. However, VAD usually works on the fly with a dynamic decision threshold, and the receiver operating characteristic (ROC) curve is a global evaluation metric of VAD over all possible decision thresholds. In this paper, we propose to maximize the area under the ROC curve (MaxAUC) by DNN, which can maximize the performance of VAD in terms of the entire ROC curve. However, the objective of AUC maximization is nondifferentiable. To overcome this difficulty, we relax the nondifferentiable loss function to two differentiable approximation functions: the sigmoid loss and the hinge loss. To study the effectiveness of the proposed MaxAUC-DNN VAD, we take either a standard feedforward neural network or a bidirectional long short-term memory network as the DNN model, with either the state-of-the-art multi-resolution cochleagram or the short-time Fourier transform as the acoustic feature. We conducted noise-independent training for all comparison methods. Experimental results show that taking AUC as the optimization objective yields higher performance than the common objectives of minimum squared error and minimum cross-entropy. The experimental conclusion is consistent across different DNN structures, acoustic features, noise scenarios, training sets, and languages.

Similar articles

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

A Hybrid Optimization Algorithm for Learning Deep Models

Deep learning is one of the subsets of machine learning that is widely used in Artificial Intelligence (AI) fields such as natural language processing and machine vision. The learning algorithms require optimization in multiple aspects. Generally, model-based inferences need to solve an optimization problem. In deep learning, the most important problem that can be solved by optimization is neural n...

Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective

The mismatch between the source and target noisy corpora severely hinders the practical use of machine-learning-based voice activity detection (VAD). In this paper, we try to address this problem from the transfer learning perspective. Transfer learning tries to find a common learning machine or a common feature subspace that is shared by both the source corpus and the target corpus. The...

Melanoma detection with a deep learning model

Background: Skin cancer is one of the most common forms of cancer in the world and melanoma is the deadliest type of skin cancer. Both melanoma and melanocytic nevi begin in melanocytes (cells that produce melanin). However, melanocytic nevi are benign whereas melanoma is malignant. This work proposes a deep learning model for classification of these two lesions.    Methods: In this analytic s...

Journal

Journal title: EURASIP Journal on Audio, Speech, and Music Processing

Year: 2022

ISSN: 1687-4722, 1687-4714

DOI: https://doi.org/10.1186/s13636-022-00260-9